82 research outputs found

    RepSeq - A database of amino acid repeats present in lower eukaryotic pathogens

    BACKGROUND: Amino acid repeat-containing proteins have a broad range of functions, and their identification is of relevance to many experimental biologists. In human-infective protozoan parasites (such as the Kinetoplastid and Plasmodium species), they are implicated in immune evasion and have been shown to influence virulence and pathogenicity. RepSeq (http://repseq.gugbe.com) is a new database of amino acid repeat-containing proteins found in lower eukaryotic pathogens. The RepSeq database is accessed via a web-based application, which also provides links to related online tools and databases for further analyses. RESULTS: The RepSeq algorithm typically identifies more than 98% of repeat-containing proteins and is capable of identifying both perfect and mismatch repeats. The proportion of proteins that contain repeat elements varies greatly between different families and even species (3-35% of the total protein content). The most common motif type is the Sequence Repeat Region (SRR), a repeated motif containing multiple different amino acid types. Proteins containing Single Amino Acid Repeats (SAARs) and Di-Peptide Repeats (DPRs) typically account for 0.5-1.0% of the total protein number. Notable exceptions are P. falciparum and D. discoideum, in which 33.67% and 34.28%, respectively, of the predicted proteomes consist of repeat-containing proteins. These numbers are due to large insertions of low-complexity single and multi-codon repeat regions. CONCLUSION: The RepSeq database provides a repository for repeat-containing proteins found in parasitic protozoa. The database allows for both individual and cross-species proteome analyses and also allows users to upload sequences of interest for analysis by the RepSeq algorithm. Identification of repeat-containing proteins provides researchers with a defined subset of proteins which can be analysed by expression profiling and functional characterisation, thereby facilitating study of pathogenicity and virulence factors in the parasitic protozoa. While primarily designed for kinetoplastid work, the RepSeq algorithm and database retain full functionality when used to analyse other species.
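
To make the SAAR and DPR categories in the abstract above concrete, here is a minimal sketch of how perfect single amino acid and di-peptide repeats could be located in a protein sequence. It is not the RepSeq algorithm: the function name `find_simple_repeats`, the `min_units` threshold, and the example sequence are assumptions made for illustration, and mismatch repeats and SRRs are not handled.

```python
import re

def find_simple_repeats(seq, min_units=5):
    """Return (kind, unit, start, end) for perfect SAARs and DPRs in seq.

    min_units is a hypothetical threshold: the minimum number of
    consecutive copies of the unit required to report a repeat.
    """
    hits = []
    # Single amino acid repeat: one residue occurring min_units+ times in a row.
    saar = re.compile(r"([A-Z])\1{%d,}" % (min_units - 1))
    for m in saar.finditer(seq):
        hits.append(("SAAR", m.group(1), m.start(), m.end()))
    # Di-peptide repeat: a unit of two *different* residues repeated in tandem.
    # The lookahead (?!\2) rejects units such as "QQ", which are SAARs.
    dpr = re.compile(r"(([A-Z])(?!\2)[A-Z])\1{%d,}" % (min_units - 1))
    for m in dpr.finditer(seq):
        hits.append(("DPR", m.group(1), m.start(), m.end()))
    return hits

# Made-up sequence: a run of six Q residues and five copies of "NA".
example = "MK" + "Q" * 6 + "LL" + "NA" * 5 + "PGS"
repeats = find_simple_repeats(example)
```

Positions are 0-based with an exclusive end, following Python slice conventions, so `example[start:end]` recovers the repeat region.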

    SSR marker-based DNA fingerprinting of Sub1 introgressed lines in the background of traditional rice varieties of Assam India

    pp. 350-356. Rice varieties are usually characterized by agro-morphological descriptors, which are used for seed certification and seed characterization following the distinctiveness, uniformity, and stability (DUS) test. In practice, however, the primary morphological descriptors that distinguish rice varieties are very limited, which makes it difficult to tell germplasm accessions apart. Germplasm certification in NBPGR requires a DNA fingerprinting profile to explain germplasm uniqueness compared to existing varieties. Varietal identification has gained a key role worldwide, particularly in plant variety protection. Studies of sixty-two morphological descriptors have shown that the Sub1 introgressed advanced lines E-6, C-210, C-196, 1189-1 and 1160-1 are distinct from the other varieties for more than 15 morphological traits; based on these variations, the lines were selected for DNA fingerprinting. About 68 SSR markers were used for DNA fingerprinting of seven genotypes: two parents (Ranjit and Bahadur), three Sub1 introgressed advanced lines (E-6, C-210, C-196) in the Ranjit background, and two Sub1 introgressed advanced lines (1189-1, 1160-1) in the Bahadur background. Among the 68 SSR markers, 65 amplified and three did not. Of the 65 amplified markers, four (RM 152, RM 172, RM 251, and RM 346) showed polymorphism, with amplicon sizes ranging over 155-163 bp, 150-159 bp, 137-147 bp, and 166-175 bp, respectively, and the remaining 61 showed monomorphic amplification. SSR (simple sequence repeat) based DNA fingerprinting therefore helped to differentiate Ranjit, Bahadur, E-6, C-210, C-196, 1189-1, and 1160-1. Hence, the research reveals that the newly developed high-yielding Sub1 introgressed advanced lines in the background of traditional Assam rice varieties (Ranjit and Bahadur) are unique in their identity.

    XSTREAM: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences

    Background: Biological sequence repeats arranged in tandem patterns are widespread in DNA and proteins. While many software tools have been designed to detect DNA tandem repeats (TRs), useful algorithms for identifying protein TRs with varied levels of degeneracy are still needed. Results: To address limitations of current repeat identification methods, and to provide an efficient and flexible algorithm for the detection and analysis of TRs in protein sequences, we designed and implemented a new computational method called XSTREAM. Running time tests confirm the practicality of XSTREAM for analyses of multi-genome datasets. Each of the key capabilities of XSTREAM (e.g., merging, nesting, long-period detection, and TR architecture modeling) is demonstrated using anecdotal examples, and the utility of XSTREAM for identifying TR proteins was validated using data from a recently published paper. Conclusion: We show that XSTREAM is a practical and valuable tool for TR detection in protein and nucleotide sequences at the multi-genome scale, and an effective tool for modeling TR domains with diverse architectures and varied levels of degeneracy. Because of these useful features, XSTREAM has significant potential for the discovery of naturally-evolved modular proteins with applications for engineering novel biostructural and biomimetic materials, and identifying new vaccine and diagnostic targets.
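
As a rough illustration of what tandem repeat detection involves, the sketch below scans a sequence for exact tandem repeats over a range of periods. It is a naive baseline, not XSTREAM: it does not handle degenerate copies, merging, nesting, or architecture modeling, and the function name, parameters, and example sequence are all hypothetical.

```python
def find_tandem_repeats(seq, min_period=1, max_period=10, min_copies=3):
    """Find exact tandem repeats; returns a list of (start, period, copies, unit).

    At each position every period in [min_period, max_period] is tried and
    the repeat covering the most residues is kept (smallest period on ties).
    """
    results = []
    n = len(seq)
    i = 0
    while i < n:
        best = None
        for period in range(min_period, max_period + 1):
            unit = seq[i:i + period]
            if len(unit) < period:
                break  # not enough sequence left for a full unit
            copies = 1
            # Count how many times the unit repeats back-to-back.
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                if best is None or copies * period > best[1] * best[2]:
                    best = (i, period, copies, unit)
        if best is not None:
            results.append(best)
            i += best[1] * best[2]  # skip past the reported repeat
        else:
            i += 1
    return results

# Made-up sequence containing four tandem copies of "GST".
example_trs = find_tandem_repeats("AAGSTGSTGSTGSTKK")
```

The running time of this brute-force scan grows with sequence length times the number of periods tried, which is one reason practical tools rely on more elaborate strategies.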

    Machine Learning Methods for Prediction of CDK-Inhibitors

    Progression through the cell cycle involves the coordinated activities of a suite of cyclin/cyclin-dependent kinase (CDK) complexes. The activities of the complexes are regulated by CDK inhibitors (CDKIs). Apart from their role as cell cycle regulators, CDKIs are involved in apoptosis, transcriptional regulation, cell fate determination, cell migration and cytoskeletal dynamics. As the complexes perform crucial and diverse functions, they are important drug targets for tumour and stem cell therapeutic interventions. However, CDKIs are represented by proteins with considerable sequence heterogeneity and may fail to be identified by simple similarity search methods. In this work we have evaluated and developed machine learning methods for identification of CDKIs. We used different compositional features and evolutionary information in the form of PSSMs, from CDKIs and non-CDKIs, for generating SVM and ANN classifiers. In the first stage, both the ANN and SVM models were evaluated using Leave-One-Out Cross-Validation, and in the second stage they were tested on independent data sets. The PSSM-based SVM model emerged as the best classifier in both stages and is publicly available through a user-friendly web interface at http://bioinfo.icgeb.res.in/cdkipred
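
The evaluation scheme named in the abstract above, Leave-One-Out Cross-Validation over compositional features, can be sketched in a few lines. This is only an illustration of the procedure: a 1-nearest-neighbour rule stands in for the SVM and ANN classifiers, no PSSM features are used, and the toy sequences and labels are invented and carry no biological meaning.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in classifier (NOT an SVM): label of the closest training point."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: dist2(train_x[i], x))
    return train_y[best]

def leave_one_out_accuracy(features, labels, predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)

# Invented toy data: label 1 = "positive" class, 0 = "negative" class.
toy_sequences = ["KKKRRKKR", "RKKRKRKK", "KRRKKKRK",
                 "DDEEDEDD", "EDDEEDED", "DEEDDEEE"]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_features = [composition(s) for s in toy_sequences]
loo_accuracy = leave_one_out_accuracy(toy_features, toy_labels,
                                      nearest_neighbour_predict)
```

Because `leave_one_out_accuracy` takes the prediction function as an argument, any other classifier with the same `(train_x, train_y, x)` signature could be swapped in without changing the evaluation loop.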

    Japanese Encephalitis—A Pathological and Clinical Perspective

    Japanese encephalitis (JE) is the leading form of viral encephalitis in Asia. It is caused by the JE virus (JEV), which belongs to the family Flaviviridae. JEV is endemic to many parts of Asia, where periodic outbreaks take hundreds of lives. Despite the catastrophes it causes, JE has remained a tropical disease uncommon in the West. With rapid globalization and climatic shift, JEV has started to emerge in areas where the threat was previously unknown. Scientific evidence predicts that JEV will soon become a global pathogen and cause of worldwide pandemics. Although some research documents JEV pathogenesis and drug discovery, worldwide awareness of the need for extensive research to deal with JE is still lacking. This review focuses on the exigency of developing a worldwide effort to acknowledge the prime importance of performing an extensive study of this thus far neglected tropical viral disease. This review also outlines the pathogenesis, the scientific efforts channeled into developing a therapy, and the outlook for a possible future breakthrough addressing this killer disease.

    A proteomics approach to decipher the molecular nature of planarian stem cells

    Background: In recent years, planaria have emerged as an important model system for research into stem cells and regeneration. Attention is focused on their unique stem cells, the neoblasts, which can differentiate into any cell type present in the adult organism. Sequencing of the Schmidtea mediterranea genome and some expressed sequence tag projects have generated extensive data on the genetic profile of these cells. However, little information is available on their protein dynamics. Results: We developed a proteomic strategy to identify neoblast-specific proteins. Here we describe the method and discuss the results in comparison to the genomic high-throughput analyses carried out in planaria and to proteomic studies using other stem cell systems. We also show functional data for some of the candidate genes selected in our proteomic approach. Conclusions: We have developed an accurate and reliable mass-spectra-based proteomics approach to complement previous genomic studies and to further achieve a more accurate understanding and description of the molecular and cellular processes related to the neoblasts.

    Regulation of inflammation in Japanese encephalitis

    Uncontrolled inflammatory response of the central nervous system is a hallmark of severe Japanese encephalitis (JE). Although inflammation is necessary to mount an efficient immune response against virus infections, exacerbated inflammatory response is often detrimental. In this context, cells of the monocytic lineage appear to be important forces driving JE pathogenesis

    Prodepth: Predict Residue Depth by Support Vector Regression Approach from Protein Sequences Only

    Residue depth (RD) is a solvent exposure measure that complements the information provided by conventional accessible surface area (ASA) and describes to what extent a residue is buried in the protein structure space. Previous studies have established that RD is correlated with several protein properties, such as protein stability, residue conservation and amino acid types. Accurate prediction of RD has many potentially important applications in the field of structural bioinformatics, for example, facilitating the identification of functionally important residues, or residues in the folding nucleus, or enzyme active sites from sequence information. In this work, we introduce an efficient approach that uses support vector regression to quantify the relationship between RD and protein sequence. We systematically investigated eight different sequence encoding schemes including both local and global sequence characteristics and examined their respective prediction performances. For the objective evaluation of our approach, we used 5-fold cross-validation to assess the prediction accuracies and showed that the overall best performance could be achieved with a correlation coefficient (CC) of 0.71 between the observed and predicted RD values and a root mean square error (RMSE) of 1.74, after incorporating the relevant multiple sequence features. The results suggest that residue depth could be reliably predicted solely from protein primary sequences: local sequence environments are the major determinants, while global sequence features could influence the prediction performance marginally. We highlight two examples as a comparison in order to illustrate the applicability of this approach. We also discuss the potential implications of this new structural parameter in the field of protein structure prediction and homology modeling. This method might prove to be a powerful tool for sequence analysis
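
The two figures of merit quoted in the abstract above, the correlation coefficient (CC) and the root mean square error (RMSE) between observed and predicted values, can be computed as follows. This is a minimal sketch that assumes CC means the Pearson correlation coefficient, which the abstract does not state explicitly; the function names are illustrative.

```python
import math

def pearson_cc(observed, predicted):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
```

The two measures are complementary: CC is insensitive to a constant offset or rescaling of the predictions, while RMSE penalises any absolute deviation, so a predictor can score a high CC and still have a large RMSE.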

    TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

    Protein backbone torsion angles (Phi) and (Psi) involve two rotation angles rotating around the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angles from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered regions, as well as other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle predictions are 27.8° and 44.6°, respectively, which are 1% and 3% lower, respectively, than those obtained using ANGLOR, one of the state-of-the-art prediction tools. Moreover, the predictions of TANGLE are significantly better than those of a random predictor built on an amino acid-specific basis, with p-values < 1.46e-147 and 7.97e-150, respectively, by the Wilcoxon signed-rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/
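
Because torsion angles are periodic, a mean absolute error over angles is usually computed on the shorter arc between observed and predicted values, so that 170° and -170° count as 20° apart rather than 340°. The sketch below shows that calculation. Whether the MAE reported in the abstract above uses exactly this wrap-around convention is an assumption, and the function names are illustrative.

```python
def angular_error(observed_deg, predicted_deg):
    """Absolute difference between two angles in degrees, wrapped to [0, 180]."""
    diff = abs(observed_deg - predicted_deg) % 360.0
    return min(diff, 360.0 - diff)

def mean_absolute_error(observed, predicted):
    """Mean of the wrapped absolute errors over paired angle lists."""
    errors = [angular_error(o, p) for o, p in zip(observed, predicted)]
    return sum(errors) / len(errors)
```

Without the wrap-around step, predictions close to the ±180° boundary would be penalised heavily even when they are geometrically near the observed angle.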